# A 36-Gb/s 2× Half-Baud-Rate Adaptive Receiver in 28-nm CMOS

Yi-Hao Lan and Shen-Iuan Liu, Fellow, IEEE

Abstract—A 36-Gb/s $2\times$ half-baud-rate (THBR) adaptive receiver (RX) is presented. A pattern-based adaptation method is proposed for adjusting the frequency response of a continuous-time linear equalizer (CTLE). In addition, the reference voltage of the comparators is adapted to enhance the timing margin of the recovered clock in the RX. This THBR adaptive RX is fabricated in TSMC 28-nm CMOS technology with a core area of 0.097 mm². The measured bit error rate (BER) is less than $10^{-12}$ for a 36-Gb/s pseudorandom binary sequence (PRBS) of $2^7-1$ when the channel loss is 19 dB at 18 GHz. The total power consumption of this RX is 76 mW with gated adaptation circuits. The calculated figure of merit (FoM) is 2.1 pJ/bit.

Index Terms—2× half-baud-rate (THBR), adaptive receiver (RX), comparator, continuous-time linear equalizer (CTLE), reference voltage.

## I. INTRODUCTION

In light of the massive volume of data traffic in data centers for various services, data rates are increasing exponentially. The baud-rate (BR) technique [1] reduces the power consumption of the clock/data recovery (CDR) circuits in order to improve the energy efficiency of the data transceivers. However, the BR technique [1] requires a special data eye diagram, shaped by the continuous-time linear equalizer (CTLE), and comparators with correct reference voltages in order to retime and recover the lossy data with one sample per unit interval (UI). To account for process variations and to reach a sufficiently low bit error rate (BER), equalizer adaptation is important for the BR receiver (RX). Yoo et al. [2] and Chen et al. [3] present various adaptive engines to adapt the reference voltages of the comparators and/or the CTLE for the BR CDR circuits. These efforts enhance the timing margin and the jitter tolerance (JTOL) under a BER of $10^{-12}$.
Manuscript received 4 January 2024; revised 26 March 2024; accepted 18 April 2024. Date of publication 29 April 2024; date of current version 28 June 2024. This work was supported in part by ASMedia Technology Inc.; in part by the Intelligent and Sustainable Medical Electronics Research Fund at National Taiwan University; and in part by the National Science and Technology Council, Taipei, Taiwan. (Corresponding author: Shen-Iuan Liu.) The authors are with the Graduate Institute of Electronics Engineering and the Department of Electrical Engineering, National Taiwan University, Taipei 10617, Taiwan (e-mail: lsi@ntu.edu.tw). Color versions of one or more figures in this article are available at https://doi.org/10.1109/TVLSI.2024.3392680. Digital Object Identifier 10.1109/TVLSI.2024.3392680

A 2× half-BR (THBR) CDR circuit [4] has been presented that reduces the bandwidth requirement of the CTLE and improves the energy efficiency, but at the expense of eye height. In [4], the data are assumed to reach full swing within two UIs, with only the first postcursor and precursor considered. In addition, the top and bottom reference voltages of the THBR CDR circuit may limit the maximum amplitude of the sinusoidal jitter (SJ) of the input data. Without proper reference voltages for the comparators, the THBR CDR circuit may fail in the worst-case scenario. For example, the passive components of the CTLE and its reference generator may vary under process variations. Consequently, the CTLE and the reference voltage of the comparators must be adapted for the THBR CDR circuit. In addition, an equalized frequency response up to 1/2 to 2/3 of the Nyquist frequency ($f_{\rm NQ}$) is necessary to reduce the intersymbol interference (ISI) for the THBR CDR circuit. To adjust the CTLE parameters, the sequential search [2], the spectrum balancing method [5], the genetic adaptation algorithm [6], and the BER-based method [7] have been introduced.
The sequential search in [2] adjusts the degenerated capacitor of the CTLE to minimize the residual ISI. It leaves the first postcursor, which can be processed by a one-tap decision feedback equalizer (DFE). However, when the degenerated resistor varies under process variations, a flat frequency response of the CTLE is not guaranteed if only the degenerated capacitor is adapted. In addition, Yoo et al. [2] require a power-hungry phase interpolator (PI). The spectrum balancing method in [5] is sensitive to the data pattern. For a low BER, the adaptation time of [6] and [7] will be lengthy. To adjust the reference voltages of the BR CDR circuit, an eye-balancing calibration [3] is presented to enhance the high-frequency JTOL. However, an additional clock is required, which degrades the energy efficiency.

To reliably operate the phase detection and data recovery, the eye diagram of the quarter-rate THBR CDR circuit [4], [7] must contain only the first precursors and postcursors, while the remaining high-order ISI terms are minimized. However, when the THBR RX [4], [7] is powered on, the initial CTLE parameters may not be properly set. Since the channel loss is not correctly compensated, the zero-crossing points will be perturbed by data-dependent jitter (DDJ), which degrades the accuracy of the phase detector (PD). If the eye diagram is closed due to the ISI, the PD may fail. Moreover, even if both the eye height and the eye width have been adapted, the PD cannot work well when the reference voltages are improper (e.g., close to 0). In addition, the threshold voltage of the comparators in the PD is also adjusted by using the adaptation method.

In this work, two contributions are presented. One is to adapt the CTLE by using a pattern-based method. It allows the THBR CDR circuit to work even if the CTLE is not well adapted in the initial power-on status.
The other is to adapt the reference voltages of the PD in the THBR CDR circuit [4], [7], which improves the timing margin of the recovered clock. For the CTLE adaptation, the first step is to achieve a well-equalized frequency response of the CTLE with the channel up to $f_{\rm NQ}/2$. The second step aims to extend the overall frequency response up to $2f_{\rm NQ}/3$. In addition, the reference voltage of the comparators is adjusted to improve the timing margin of the recovered clock. This pattern-based adaptation method does not require the PI [2] or an additional multiphase clock [3]. Consequently, this THBR adaptive RX can work without manual parameter adjustment and improves the energy efficiency.

This article is organized as follows. Section II describes the adaptation methods for the CTLE and the reference voltage of the comparators. In Section III, the circuit description is presented. Section IV presents the experimental results. The conclusion is given in Section V.

## II. PATTERN-BASED ADAPTATION

## A. Review of THBR CDR Circuit [7]

Fig. 1(a) shows the eye diagram of the received data. For four consecutive data D[k] to D[k-3], where k is the time index, the quarter-rate THBR CDR circuit [7] utilizes eight comparators with the threshold voltages $+V_{\rm H}$, 0, and $-V_{\rm H}$ as well as four quarter-rate clocks $CK_0$, $CK_{45}$, $CK_{135}$, and $CK_{180}$, where $CK_x$ is the clock with a phase of x degrees. Note that $CK_{225}$ is used in [4] instead of $CK_{135}$. By using $CK_{135}$ instead of $CK_{225}$ in this work, the number of retimers can be reduced [7]. When the quarter-rate THBR CDR circuit [7] is locked, the edge of $CK_0$ is aligned with the data transition between D[k-2] and D[k-3], and the edge of $CK_{180}$ is aligned with the data transition between D[k-1] and D[k]. In addition, $CK_{45}$ and $CK_{135}$ are aligned with the centers of D[k-2] and D[k-1], respectively, while D[k] and D[k-3] are not sampled.
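The clock alignment described above can be cross-checked with a short calculation. The following Python sketch assumes that a quarter-rate clock period spans four UIs, so a phase of x degrees corresponds to an offset of x/360 × 4 UI; the function and variable names are illustrative and do not come from the paper.

```python
def phase_to_ui(phase_deg, ui_per_clock_period=4):
    """Convert a clock phase in degrees into a sampling offset in UI.

    Assumption: one quarter-rate clock period covers four UIs,
    so 360 degrees corresponds to 4 UI.
    """
    return phase_deg / 360.0 * ui_per_clock_period


DATA_RATE_BPS = 36e9
# Assumption: "quarter-rate" means the clock runs at one quarter of the bit rate.
QUARTER_RATE_CLOCK_HZ = DATA_RATE_BPS / 4

# Offsets are measured from the transition between D[k-3] and D[k-2].
sampling_offsets_ui = {f"CK{p}": phase_to_ui(p) for p in (0, 45, 135, 180)}

for name, offset in sampling_offsets_ui.items():
    print(f"{name}: {offset} UI")
```

Under these assumptions, $CK_0$ and $CK_{180}$ fall on bit boundaries two UIs apart, while $CK_{45}$ and $CK_{135}$ fall at the centers of the two bits in between, which matches the alignment stated in the text.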
To detect whether the clocks $CK_0$ and $CK_{180}$ are running early or late, the phase detection truth table is shown in Fig. 1(b). Using the comparators with $CK_0$ and $CK_{45}$, the sampled data $[DH_0, DL_0, DM_{45}] = [1, 0, 1]$ are obtained for a rising data transition. The edge sampled data $ED_0$ then determines whether the clock $CK_0$ is early or late, as shown in Fig. 1(b). Using the comparators with $CK_{135}$ and $CK_{180}$, the edge sampled data $ED_{180}$ can determine whether the clock $CK_{180}$ is early or late when the sampled data $[DH_{180}, DL_{180}, DM_{135}] = [1, 0, 1]$. Using $DM_{135}$ and $DM_{45}$, the data D[k-1] and D[k-2] can be recovered, respectively. To recover the data D[k] and D[k-3], the data recovery truth table is shown in Fig. 1(c). The adaptation methods for the CTLE and the reference voltage are discussed below.

## B. Adaptation Method

The proposed adaptation method adjusts the CTLE parameters in a two-step process. In the first step, the DDJ may severely perturb the edge sampled data $ED_0$ and $ED_{180}$. The patterns 0011, 1100, and 001100 are first selected to flatten the frequency response of the CTLE with the channel up to $f_{\mathrm{NQ}}/2$. For patterns 0011 and 1100, the fundamental frequency is about $f_{\mathrm{NQ}}/2$. In the second step, patterns 1101 and 1011 are chosen to further extend the overall bandwidth.

Fig. 1. (a) Eye diagram sampled by four multiphase clocks and eight comparators, and the truth tables for (b) phase detection and (c) data recovery.

Fig. 2. Normalized amplitude of the third bit of patterns 0010 and 0011 using seven traces of the Keysight M8049A ISI board.

Initially, the eye diagram is closed due to the improper CTLE parameters and the high channel loss. The THBR PD may not work correctly.
For example, when a rising data transition does not cross 0 and $[DH_0, DL_0, DM_{45}] = [0, 1, 0]$ is obtained, the PD may output the opposite result. To address this issue, let us consider seven traces on the Keysight M8049A ISI board at 36-Gb/s NRZ, whose S-parameters are extracted by a vector network analyzer (VNA) and used for the simulations hereafter. For patterns 0011 and 0010, Fig. 2 shows the simulated amplitude of the third bit normalized by the peak amplitude of the single-bit response (SBR). Once the channel loss at 18 GHz exceeds 15.8 dB, the normalized amplitude of the third bit becomes negative for the pattern 0010. However, it remains positive for the pattern 0011 due to the precursor of "1" at the 4th bit. Even if the CTLE parameters are not yet properly set, the two special patterns 0011 and 1100 can be detected under a channel loss of 20 dB. For instance, if D[k-4] = 0, $[DH_0, DL_0, DM_{45}] = [1, 0, 1]$, and $[DM_{135}] = [1]$ are captured, $\{D[k-4], D[k-3], D[k-2], D[k-1]\} = \{0, 0, 1, 1\}$ is correctly recovered. Once the valid pattern 0011 is found, the edge sampled data $ED_0$ is used to update the phase-detecting result; otherwise, it is not updated. In this way, once the PD is locked by using the patterns 0011 and 1100, the sampled amplitude of the patterns at $CK_0$ or $CK_{180}$ satisfies

$$h_{-0.5} + h_{-1.5} = h_{+0.5} + h_{+1.5} \tag{1}$$

where $h_{k.5}$ represents the k.5th residual ISI and $k \in \mathbb{Z}$.

Fig. 3. Waveform of pattern 001100 at the CTLE output.

Then, a special pattern 001100 is monitored. Fig. 3 illustrates the waveform of pattern 001100, where $CK_0$ and $CK_{180}$ sample at the first and second waveform transitions, respectively. Two random variables $RV_0$ and $RV_{180}$ represent the amplitudes sampled by $CK_0$ and $CK_{180}$, respectively.
According to (1), the difference between the expectation values of $RV_0$ and $RV_{180}$ is given as follows:

$$E[RV_0] - E[RV_{180}] = -h_{-2.5} - h_{-3.5} + h_{+2.5} + h_{+3.5}. \tag{2}$$

Once (1) is true, the edge sampled data $ED_0$ and $ED_{180}$ of pattern 001100 are only affected by $h_{\pm 2.5}$ and $h_{\pm 3.5}$. For pattern 001100, if $E[RV_{180}] \neq E[RV_0]$, the waveform of Fig. 3 is asymmetric. If $E[RV_0] > E[RV_{180}]$, the CTLE's zero is lowered to increase the boosting gain, which decreases $h_{+2.5} + h_{+3.5}$. If $E[RV_0] < E[RV_{180}]$, the CTLE's zero is raised to decrease the boosting gain, which increases $h_{+2.5} + h_{+3.5}$. $E[RV_0]$ and $E[RV_{180}]$ represent the mean zero-crossing voltages sampled by $CK_0$ and $CK_{180}$, respectively. If both $E[RV_0] < 0$ and $E[RV_{180}] < 0$, the sampled amplitudes are simultaneously lower than the zero-crossing threshold, which implies that the boosting factor is insufficient. Similarly, if both $E[RV_0] > 0$ and $E[RV_{180}] > 0$, the boosting factor is excessive. When both $E[RV_0]$ and $E[RV_{180}]$ are close to 0, the frequency response of the CTLE with the channel is compensated up to $f_{\mathrm{NQ}}/2$.

A conventional CTLE [8] is composed of a differential pair with a degenerated resistor $R_{\rm S}$ and a degenerated capacitor $C_{\rm S}$. The low-frequency gain is reduced by increasing the boosting factor of $(1 + g_{\rm m}R_{\rm S}/2)$, where $g_{\rm m}$ is the transconductance. This is equivalent to increasing the boosting gain. The CTLE's zero is equal to $1/(R_{\rm S}C_{\rm S})$. Moreover, $E[RV_0]$ and $E[RV_{180}]$ can be estimated by using the edge sampled data $ED_0$ and $ED_{180}$, respectively.
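As a quick numerical illustration of the two relations quoted above for the conventional CTLE, the following Python sketch evaluates the boosting factor $(1 + g_{\rm m}R_{\rm S}/2)$ and the zero $1/(R_{\rm S}C_{\rm S})$. The function name and the component values are hypothetical and are not taken from the fabricated design.

```python
import math


def ctle_boost_and_zero(gm, rs, cs):
    """Return (boosting factor, boosting in dB, zero frequency in Hz).

    gm : transconductance of the input pair in siemens
    rs : degenerated resistor R_S in ohms
    cs : degenerated capacitor C_S in farads
    The zero 1/(R_S * C_S) is an angular frequency, so it is divided
    by 2*pi to express it in hertz.
    """
    boost = 1.0 + gm * rs / 2.0
    boost_db = 20.0 * math.log10(boost)
    f_zero_hz = 1.0 / (2.0 * math.pi * rs * cs)
    return boost, boost_db, f_zero_hz


# Hypothetical values, chosen only to make the arithmetic easy to follow.
boost_a, boost_db_a, fz_a = ctle_boost_and_zero(gm=10e-3, rs=200.0, cs=100e-15)
boost_b, boost_db_b, fz_b = ctle_boost_and_zero(gm=10e-3, rs=400.0, cs=100e-15)

print(f"R_S = 200 ohm: boost = {boost_a:.2f} ({boost_db_a:.1f} dB), zero = {fz_a / 1e9:.2f} GHz")
print(f"R_S = 400 ohm: boost = {boost_b:.2f} ({boost_db_b:.1f} dB), zero = {fz_b / 1e9:.2f} GHz")
```

With $C_{\rm S}$ held fixed in this sketch, a larger $R_{\rm S}$ raises the boosting factor and at the same time lowers the zero frequency, so changing $R_{\rm S}$ alone moves both quantities.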
Therefore, the degenerated resistor $R_{\rm S}$ is updated to adjust the boosting gain of the CTLE by

$$R_{\rm S}^{n+1} = R_{\rm S}^{n} - \Delta_{1} \cdot \operatorname{sign}\left\{\sum_{m=0}^{7} P^{m}\left[\left(ED_{0}^{m} \cdot ED_{180}^{m}\right) - \left(\overline{ED_{0}^{m}} \cdot \overline{ED_{180}^{m}}\right)\right]\right\} \tag{3}$$

where n is an updating index and $\Delta_1$ is a step size. In (3), the control signal $P^m$ is equal to 1 when a valid pattern 001100 appears, and otherwise, $P^m$ is 0, where m is the time index. The polarity of $(E[RV_0] - E[RV_{180}])$ can be estimated by $\operatorname{sign}\{\sum_{m=0}^{7} P^m[(ED_0^m \cdot ED_{180}^m) - (\overline{ED_0^m} \cdot \overline{ED_{180}^m})]\}$. The parameter $R_{\rm S}^n$ is updated once every eight clock cycles, so m ranges from 0 to 7. Although $R_{\rm S}$ is updated to adjust the boosting gain, the CTLE's zero is also altered. The adaptation loop therefore adjusts the degenerated capacitor $C_{\rm S}$, which alters the CTLE's zero, by

$$C_{\rm S}^{n+1} = C_{\rm S}^{n} + \Delta_{2} \cdot \operatorname{sign}\left\{\sum_{m=0}^{7} P^{m}\left[\left(ED_{0}^{m} \cdot \overline{ED_{180}^{m}}\right) - \left(\overline{ED_{0}^{m}} \cdot ED_{180}^{m}\right)\right]\right\} \tag{4}$$

where $\Delta_2$ is a step size. The adaptation loop updates $R_{\rm S}$ and $C_{\rm S}$ to find the best solution.

Fig. 4. Split zero-crossing edges of 1101 and 1011.

Although the first step only works for the low-rate patterns, the second step considers the high-rate ones, such as 1101 and 1011, to minimize the ISI. As shown in Fig. 4, the bimodal jitter distribution for patterns 1101 and 1011 splits at the zero-crossing edges. Suppose that the bimodal jitter distribution is composed of two unimodal distributions with the same standard deviation $\sigma$ but different means $\mu_1$ and $\mu_2$. According to [9], the bimodal jitter distribution becomes unimodal with the mean of $\mu$ if

$$|\mu_1 - \mu_2| < 2\sigma. \tag{5}$$

For the histograms of patterns 1011 and 1101 to have a unimodal distribution, their respective means $\mu_1$ and $\mu_2$ must fall within the interval between $\mu - \sigma$ and $\mu + \sigma$. Due to the precursors of the identical third and fourth bits, the zero-crossing point for pattern 1011 is moved early. According to the Gaussian distribution property, when $\mu_1 = \mu - \sigma$, the zero-crossing probabilities of CK Late and Early are equal to 84% and 16%, respectively, with respect to $\mu$. The greater the disparity between the probabilities, the larger the bandwidth the CTLE requires. For pattern 1011, if the zero-crossing probability ratio between CK Late and Early is less than 5.25 (i.e., 84%/16%), the bandwidth of the CTLE is decreased; if the ratio is larger than 5.25, the bandwidth is increased. For pattern 1101 with respect to $\mu_2$, a similar method is utilized. Note that the zero-crossing probabilities of CK Early and Late are calculated using the truth table of Fig. 1(b). As long as (5) is satisfied, the unimodal distribution is attained to decrease the DDJ. The implementation will be described in detail in Section III.

Fig. 5. Timing margin of $CK_{180}$ is (a) small for an improper $V_{\rm H}$ and (b) large for a proper $V_{\rm H}$.

When the CTLE is calibrated properly, Fig. 5(a) and (b) show the same eye diagram, in which the timing margin of $CK_{180}$ is small with an improper $V_{\rm H}$ and large with a proper $V_{\rm H}$, respectively. To achieve the maximum timing margin of $CK_{180}$, $V_{\rm H}$ must lie at the intersection of patterns 011 and 110, as shown in Fig. 5(b). By adding an additional comparator clocked by $CK_{135}$ with the threshold $V_{\rm H}$, the sampled data $DH_{135}$ is generated. The control signal $P_{110}^m$ is 1 for a valid pattern 110, and otherwise, it equals 0, where m is the time index and m = 0–7. The gradient to update $V_{\rm H}$ is estimated by $\operatorname{sign}\{\sum_{m=0}^{7} P_{110}^m[DH_{135}^m - \overline{DH_{135}^m}]\}$.
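The gradient estimate above can be sketched in a few lines of Python. This is only an illustrative software model, assuming the eight samples are available as lists of 0/1 values; the function names and the step value are hypothetical, and the on-chip logic is not claimed to be organized this way.

```python
def sign(x):
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def vh_gradient(p110, dh135):
    """Estimate the update polarity for V_H from eight samples.

    p110[m]  : 1 when a valid pattern 110 is detected at time index m, else 0
    dh135[m] : output of the extra comparator (threshold V_H, clocked by CK135)
    A valid sample contributes DH135 - NOT(DH135): +1 for a 1 and -1 for a 0.
    Samples without a valid pattern contribute nothing.
    """
    total = sum(p * (1 if dh else -1) for p, dh in zip(p110, dh135))
    return sign(total)


def step_vh(vh, p110, dh135, step):
    """Move V_H by one (hypothetical) step along the estimated polarity."""
    return vh + step * vh_gradient(p110, dh135)


# Three valid patterns: two samples above V_H, one below, so the polarity is +1.
p110 = [1, 0, 1, 1, 0, 0, 0, 0]
dh135 = [1, 1, 1, 0, 0, 0, 0, 0]
print(vh_gradient(p110, dh135))
```

In this model, a majority of valid samples above the threshold pushes $V_{\rm H}$ up and a majority below pushes it down, so the threshold settles where the two outcomes are equally likely.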
Thus, $V_{\rm H}$ is updated as follows:

$$V_{\rm H}^{n+1} = V_{\rm H}^n + \Delta_3 \cdot \operatorname{sign}\left\{\sum_{m=0}^7 P_{110}^m \left[DH_{135}^m - \overline{DH_{135}^m}\right]\right\} \tag{6}$$

where $\Delta_3$ is a step size. A well-equalized SBR is symmetric, meaning that $h_{-1}$ is equal to $h_{+1}$. Since $h_{-1}$ and $h_{+1}$ are canceled, the amplitude of the middle bit of the valid pattern 110 is equal to the main cursor $h_0$. Consequently, the ideal reference voltage $V_{\rm H}$ can be equal to $h_0$. Note that if the eye diagram at the CTLE's output is over-equalized, the calibration of $V_{\rm H}$ may degrade the eye height.

## III. CIRCUIT DESCRIPTION

Fig. 6 shows the proposed THBR adaptive RX and the measured channel response ($S_{21}$). This RX consists of a CTLE, nine comparators, a digitally controlled oscillator (DCO), a divide-by-8 divider, nine 1:8 DMUXes, a resistive-ladder digital-to-analog converter (RDAC), a 7-bit reference voltage generator (RVG) [12], and a synthesized logic circuit. The DCO [3] is composed of four delay stages, and every stage consists of two inverters, a latch, and varactors. The varactors are controlled by the RDAC with a 127-bit thermometer code. In the clock path, simple buffers and dummies are added to match the delays. Moreover, iterative postsimulations are used to minimize the clock skew and the coupling effects. The eight comparators [10] with three reference voltages function as the PD and data recovery [7], while the additional comparator is used to adjust the reference voltage $V_{\rm H}$. To mitigate the offset voltage of the comparators, a large input differential pair is used to reduce the mismatch. The THBR CDR circuit needs data with a specific ISI or channel loss. Comparators with high sensitivity are therefore needed, which may increase the power consumption.
The synthesized circuit is composed of a reference voltage adaptation circuit for the comparators, a CTLE adaptation circuit, a PD lock detector (PDLD), a PD, a digital loop filter (DLF), a binary-to-thermometer converter (B2T), and a data decoder with a pseudorandom binary sequence (PRBS) checker [11]. The PRBS checker monitors 32-bit recovered data, and its output signal error is used to calculate the BER. The PDLD monitors the probability difference between *Early* and *Late* of the PD and activates the two adaptation circuits via the signal lock to prevent adaptation malfunction. The RDAC generates the control voltages for the CTLE, and the 7-bit RVG [12] generates the differential threshold voltages $+V_{\rm H}$, 0, and $-V_{\rm H}$ for the nine comparators. This adaptive RX is simulated under supply variations of $\pm 10\%$, five process corners, and the temperature range of 70 °C–130 °C. The simulation results show that the adaptive RX works under the above conditions.

## A. Continuous-Time Linear Equalizer (CTLE)

The measured channel response is shown in Fig. 6, in which the slope is approximately 12 dB/octave. To compensate for this channel loss, a three-stage CTLE is used, as shown in Fig. 7(a). The first stage of the CTLE employs the degenerated resistor $R_{\rm S}$ and the degenerated capacitor $C_{\rm S}$. The voltage $V_{Rs}$ controls the nMOS transistor, which realizes the degenerated resistor. Two varactors, controlled by the voltage $V_{Cs}$, are utilized to realize the degenerated capacitor. A common-mode feedback (CMFB) circuit controls the bias current. To compensate for the high-frequency loss, the second stage employs an active inductor [13]. The voltage-controlled resistor in the active inductor is realized by the pMOS transistor, which is controlled by the voltage $V_{Ld}$. The CMFB circuit from the first stage also controls the bias current of the second stage.
A differential pair consisting of a degenerated resistor, a degenerated capacitor, and a CMFB circuit realizes the third stage of the CTLE. The RDAC generates the voltages $V_{Rs}$, $V_{Cs}$, and $V_{Ld}$, which are controlled by $C_{Rs}[14:12]$, $C_{Cs}[14:12]$, and $C_{Ld}[3:0]$, respectively. When $V_{Rs}$ is altered from 687.5 to 950 mV, the simulated low-frequency gain of the CTLE changes from -6.30 to -1.11 dB. Moreover, when $V_{Cs}$ is changed from 50 to 750 mV, the zero is shifted from 0.8 to 3 GHz at $C_{Rs}[14:12]$ = 3'b000 (i.e., $V_{Rs} = 687.5$ mV). When $C_{Rs}[14:12]$ = 3'b011 and $C_{Cs}[14:12]$ = 3'b011, $C_{Ld}[3:0]$ is swept from 4'b0000 to 4'b1111, and $V_{Ld}$ is increased from 50 to 425 mV. Fig. 7(b) shows the SBR with varying $C_{Cs}[14:12]$ and a fixed $C_{Rs}[14:12]$ = 3'b111. By decreasing $C_{Cs}[14:12]$, the zero is lowered to decrease the postcursors $h_{+2.5}$ and $h_{+3.5}$ and increase the peak amplitude of the SBR. Fig. 7(c) shows the SBR with varying $C_{Rs}[14:12]$ and a fixed $C_{Cs}[14:12]$ = 3'b111. By decreasing $C_{Rs}[14:12]$, the boosting factor is increased to eliminate the postcursors at the cost of slightly lowering the peak amplitude. Note that by altering the zero alone, $h_{+2.5}$ and $h_{+3.5}$ cannot be removed entirely; the boosting gain of the CTLE must also be adjusted. That is the reason why the proposed adaptation method adjusts the two parameters $C_{\rm S}$ and $R_{\rm S}$ of the CTLE.

Fig. 6. Proposed THBR adaptive RX and the measured channel response.

Fig. 7. (a) Three-stage CTLE. (b) SBR with variable $C_{Cs}[14:12]$ and a fixed $C_{Rs}[14:12]$ = 3'b111. (c) SBR with variable $C_{Rs}[14:12]$ and a fixed $C_{Cs}[14:12]$ = 3'b111. (d) Overall simulated frequency response of the channel and the CTLE sweeping $C_{Ld}[3:0]$ at $C_{Rs}[14:12]$ = 3'b011 and $C_{Cs}[14:12]$ = 3'b011.

Fig. 7(d) shows the simulated overall frequency response of the CTLE and the channel. By increasing $C_{Ld}[3:0]$, the overall bandwidth is extended.
When $C_{Rs}[14:12]$ = 3'b011, $C_{Cs}[14:12]$ = 3'b011, and $C_{Ld}[3:0]$ = 4'b1100, the simulated bandwidth can reach $2f_{\rm NQ}/3$, i.e., 12 GHz for 36-Gb/s NRZ input data. To maintain the input swing of the comparators for different channels, a variable gain amplifier (VGA) should be added. The adaptation circuits are discussed in detail as follows.

## B. CTLE Adaptation Circuit

Fig. 8(a) shows the CTLE adaptation circuit in the first step. When the pattern 001100 is detected by the pattern filter, the control signal P is 1; otherwise, it is 0. In accordance with P[7:0], $ED_0[7:0]$, and $ED_{180}[7:0]$, eight bitwise operators produce eight 4-bit one-hot codes, [a b c d]. That is, the operation using $P^m$, $ED_0^m$, and $ED_{180}^m$ in (3) and (4) can be represented by eight 4-bit one-hot codes. Similar to (3) and (4), the codes $C_{Rs}[14:0]$ and $C_{Cs}[14:0]$ are updated to control $R_{\rm S}$ and $C_{\rm S}$ via the RDAC. To reduce switching transient interference, the adaptation bandwidth is kept much less than 1 MHz, and the step size $\gamma$ (= 1/1024) is chosen. In addition, the first-stage CTLE is controlled by the three MSBs of $C_{Rs}[14:0]$ and $C_{Cs}[14:0]$. To estimate the convergence time, assume the sequence 001100 appears only once in a 36-Gb/s PRBS of $2^7 - 1$. The time interval to update one LSB of $C_{Cs}[14:12]$ and $C_{Rs}[14:12]$ is calculated to be around 57 $\mu$s. Then, the worst-case convergence time is 57 $\mu$s multiplied by 7, which is around 0.4 ms.

Fig. 8(b) shows the CTLE adaptation circuit in the second step. For the patterns 1011 and 1101, two probability calculators (PCs), PC1 and PC2, are used. Once pattern 1011 is detected, the control signal $P_{1011}$ is 1, and otherwise, it is 0. By using $P_{1011}[15:0]$, $ED_0[7:0]$, $ED_{180}[7:0]$, and 16 bitwise operators, the PC1 generates the signals Early[15:0] and Late[15:0].
Then, an accumulator (ACC) accumulates Early[15:0] and Late[15:0] to produce the outputs $N_{\rm E}$ and $N_{\rm L}$, respectively. The total sample number $N_{\rm T}$ is defined as the sum of $N_{\rm E}$ and $N_{\rm L}$. When $N_{\rm T} \geq 8192$, if

$$N_{\rm E} \le 1311 \text{ and } N_{\rm L} \ge 6881 \tag{7}$$

the ratio of $N_{\rm L}$ to $N_{\rm E}$ is larger than 5.25. These thresholds correspond to the 16% and 84% probabilities of Section II-B, since $0.16 \times 8192 \approx 1311$ and $0.84 \times 8192 \approx 6881$. It indicates that the zero-crossing edge is distant from the mean, and the bandwidth of the CTLE should be increased to avoid the bimodal distribution. Note that $N_{\rm T} \geq 8192$ is chosen for an error margin of 1% and a confidence interval of 99% [14]. If the ratio of $N_{\rm L}$ to $N_{\rm E}$ is less than 5.25, the bandwidth of the CTLE should be reduced. Moreover, once $N_{\rm T} \geq 8192$, the ACC is reset. For the pattern 1101, the PC2 is utilized. When $N_{\rm T} \ge 8192$, if

$$N_{\rm E} \ge 6881 \text{ and } N_{\rm L} \le 1311 \tag{8}$$

the ratio of $N_{\rm E}$ to $N_{\rm L}$ is larger than 5.25. Similarly, the bandwidth of the CTLE should be increased to prevent the bimodal distribution. If the ratio of $N_{\rm E}$ to $N_{\rm L}$ is less than 5.25, the bandwidth of the CTLE should be reduced. When $N_{\rm T} \geq 8192$ and (7) and (8) are true, $C_{Ld}[3:0]$ is updated to increase the CTLE's bandwidth, and vice versa.

Fig. 8. CTLE adaptation circuit in (a) first step and (b) second step.

A lock-and-frozen method is implemented in the CTLE adaptation circuit to avoid bouncing back and forth once the loops converge. The LSBs of $C_{Cs}[14:12]$, $C_{Rs}[14:12]$, and $C_{Ld}[3:0]$ are monitored. Once the LSBs bounce back and forth, the adaptation process is held.

For a 36-Gb/s PRBS of $2^7 - 1$ with the channel response in Fig. 6 and the CTLE in Fig. 7(a), Fig. 9(a) shows the simulated eye diagram at the CTLE's input. With the first-step adaptation, Fig. 9(b) shows the simulated eye diagram at the CTLE's output with $C_{Rs}[14:12]$ = 3'b011, $C_{Cs}[14:12]$ = 3'b011, and $C_{Ld}[3:0]$ = 4'b0001. The zero-crossing edges split to form the bimodal distribution. Incorporating both the first-step and second-step adaptations, Fig. 9(c) shows the simulated eye diagram at the CTLE's output with $C_{Rs}[14:12]$ = 3'b011, $C_{Cs}[14:12]$ = 3'b011, and $C_{Ld}[3:0]$ = 4'b1100. Compared with Fig. 9(b), the DDJ is reduced and the eye width is improved by 0.17 UI in Fig. 9(c). Fig. 9(d) shows the frequency responses with and without adaptations. The simulated bandwidth is up to 12 GHz.

Fig. 9. Simulated eye diagrams at (a) CTLE input, (b) with the first-step adaptation, (c) with both the first- and second-step adaptations, and (d) simulated frequency responses with and without adaptations.

## C. Reference Voltage Adaptation Circuit

According to the simulation results of Fig. 9(c), the initial differential reference voltage can be set between 200 and 500 mV<sub>dpp</sub>. While adjusting the CTLE, the signal swing may be changed. To ensure the robustness of the reference voltage adaptation, the initial differential reference voltage is chosen as 250 mV<sub>pp</sub>. Fig. 10(a) and (b) show the adaptation circuit and its adaptation flow, respectively, for the reference voltages. According to the adaptation flow, the control signal $P_{110}$ is 1 when the pattern 110 is validated by the pattern filter, and otherwise, it equals 0. By using $DH_{135}[7:0]$ and $P_{110}[7:0]$, the adaptation circuit uses eight bitwise operators to generate the signals INC and DEC. The difference between INC and DEC is then accumulated, where the sign function determines the polarity. Similar to (6), the code $C_{VH}[19:0]$ is updated as shown in Fig. 10(a), where $\gamma_{RV}$ is the update step. The seven MSBs $C_{VH}[19:13]$ control the RVG to generate the differential reference voltages $\pm V_{\rm H}$ and 0.
The simulated settling time of the RVG is approximately 125 ns. The system clock of 1.125 GHz is utilized for the adaption circuit. To consider the settling time of the RVG, the step size $\gamma_{RV}$ of 1/512 is chosen to limit the updated period to 456 ns ( $\sim$ 512/1.125 GHz). When the PDLD is locked, a time-out detector is used to monitor $C_{VH}$ [19:13]. Assuming that this code is fixed within 2048 clock cycles, it is frozen to control the RVG. ## IV. EXPERIMENTAL RESULTS Fig. 11(a) shows the die photo of the proposed adaptive RX and its layout. It is fabricated in 28-nm CMOS technology and has an active area of 0.097 mm<sup>2</sup>. The supply voltages of 0.8 and 1 V are for the synthesized logic circuit and the remaining ones, respectively. Once the adaption is completed, Fig. 10. (a) Adaptation circuit and (b) adaptation flow for the reference voltage. Fig. 11. (a) Die photograph and its layout, and the power breakdown when the clock of the adaption circuits is (b) gated and (c) not gated. the clock of the adaptation circuits is gated to save the power. Fig. 11(b) and (c) shows the power breakdown of the adaptive RX, while the clock of the adaptation circuits is gated versus when it is not gated, respectively. The total power dissipation of the adaptive RX is 76 mW with the gated adaptation circuits for the 36 Gb/s PRBS of $2^7 - 1$ . To further reduce the power consumption, the multiple supply voltages can be used [15], [16]. For example, the DMUX and the divider can adopt the sub-1-V supply if the speed is allowed. In addition, the digital circuit can use the multi- $V_{\rm th}$ devices to minimize the leakage current. The measurement setup is shown in Fig. 12. A 36-Gb/s data are generated by Keysight M8040A through a 16.65" M8049A-003 ISI board with the cables and the connectors. The measured channel response of Fig. 6 shows the channel Fig. 12. Measurement setup. Fig. 13. 
Measured CTLE adaptation transient responses for CCs[14:12], CRs[14:12], and CLd[3:0] of the CTLE and the signal error of the PRBS checker. loss at 18 GHz is -19 dB. The descrialized recovery data and the divide-by-8 recovery clock are measured by the sampling oscilloscope Keysight 86118A and the phase noise analyzer R&S FSWP. To measure the transient responses of the adaptation circuits, $C_{Cs}[14:12]$ , $C_{Rs}[14:12]$ , $C_{Ld}[3:0]$ , and $C_{VH}[19:13]$ are converted into analog voltages by using the multiplexer and an on-chip 6-bit digital-to-analog converter (DAC) with a full scale of 400 mV. The output signal error of the PRBS checker and the DAC output voltage, DAC O/P, are measured by the oscilloscope Tektronix MDO3104. Fig. 13 shows the adaptive CTLE transient responses. The initial conditions of $C_{C_8}[14:12]$ , $C_{R_8}[14:12]$ , and $C_{L_d}[3:0]$ are 3' b111, 3' b111, and 3' b0001, respectively. The remaining LSBs of the 6-bit DAC are all equal to 0. Before the CTLE adaptation, the measured initial BER is around 0.5. In the first step, $C_{Cs}[14:12]$ and $C_{Rs}[14:12]$ are converted by the onchip 6-bit DAC. The measured settling time is approximately 500 $\mu$ s in the first step. In the second step, $C_{Ld}[3:0]$ is active. The total measured settling time is approximately 1.5 ms. The final values for $C_{CS}[14:12]$ , $C_{RS}[14:12]$ , and $C_{Ld}[3:0]$ are equal to 3' b011, 3' b100, and 4' b1011, respectively. A counter is used to continuously count the number of error bits in the PRBS checker. When the LSB of this counter keeps unchanged (i.e., not toggling), the signal error is kept as either logic 1 or logic 0. Thus, the error is fixed as either logic 1 or 0, and an error-free case is found. In Fig. 13, the error stops to toggle at 1.4 ms roughly. Fig. 14. 
Measured transient responses of the reference voltage adaptation and the signal error of the PRBS checker for the data with SJ magnitudes of (a) 0.2 and (b) 0.4 UI<sub>pp</sub> and a jitter modulation frequency of 20 MHz.

Fig. 14(a) and (b) shows the measured transient responses of the reference voltage adaptation for the data with SJ magnitudes of 0.2 and 0.4 UI<sub>pp</sub>, respectively, and a jitter modulation frequency of 20 MHz. The quarter-rate THBR CDR circuit has a bandwidth of approximately 10 MHz. The jitter modulation frequency of 20 MHz is chosen so that it cannot be accurately tracked by the THBR CDR circuit. In Fig. 14, $C_{VH}[19:14]$ is converted by the on-chip 6-bit DAC. For an SJ magnitude of 0.2 UI<sub>pp</sub>, Fig. 14(a) shows a measured settling time of 0.5 ms, and the output signal error of the PRBS checker ceases to toggle at 0.25 ms. For an SJ magnitude of 0.4 UI<sub>pp</sub>, Fig. 14(b) shows a measured settling time of 0.85 ms, and the error ceases to toggle at 0.75 ms. In both Fig. 14(a) and (b), the final DAC O/P converges to 145 mV. The final reference voltage $V_{\rm H}$ of the comparators converges, and the recovered clock tolerates the data with the SJ.

These adaptation loops are turned on sequentially. The CTLE adaptation relies on the locked PD, so the adaptation loops for the CTLE are activated first. Once the CTLE adaptation loops are complete, the reference voltage adaptation loop is activated. The estimated bandwidths of the CDR circuit, the CTLE adaptation loops, and the reference voltage adaptation loop are 10 MHz, 1 kHz, and 2 kHz, respectively.

Fig. 15(a) shows the measured eye diagram of the single-ended output of the channel in Fig. 6 with the 36-Gb/s PRBS of $2^7 - 1$. Fig. 15(b) shows the eye diagram of the recovered data deserialized by 32 at 1.125 Gb/s. The measured root-mean-square (rms) and peak-to-peak jitter values are 4.14 ps<sub>rms</sub> and 26.66 ps<sub>pp</sub>, respectively, at 1.125 Gb/s. Fig. 16(a) shows the divide-by-8 recovered clock at 1.125 GHz.

Fig. 15. (a) Measured eye diagram of the single-ended output of the channel in Fig. 6 with the 36-Gb/s PRBS of $2^7 - 1$. (b) Eye diagram of the recovered data deserialized by 32 at 1.125 Gb/s.

Fig. 16. (a) Eye diagram and (b) phase noise of the divide-by-8 recovered clock at 1.125 GHz.

TABLE I. PERFORMANCE SUMMARY AND COMPARISON

| | [2] | [3] | [4] | [15] | [16] | This work |
|---|---|---|---|---|---|---|
| Technology | 28-nm | 40-nm | 28-nm | 22-nm | 65-nm | 28-nm |
| Supply [V] | 0.9 | 1.2/1 | 0.9 | 1.15/1.1/0.85 | 1.18/1.1/1 | 1/0.8 |
| Data rate | 36 Gb/s | 20 Gb/s | 30 Gb/s | 16 Gb/s | 10.8 Gb/s | 36 Gb/s |
| Channel loss<br>@ Nyquist freq. | -18.25 dB<br>@ 18 GHz | -10.31 dB<br>@ 10 GHz | 13.06 dB<br>@ 15 GHz | -34.1 dB<br>@ 5.4 GHz | -34 dB<br>@ 5.4 GHz | -19 dB<br>@ 18 GHz |
| Equalizer | CTLE<br>DFE | DFE | CTLE | CTLE<br>DFE | CTLE<br>DFE | CTLE |
| Equalizer<br>adaptation | Sequential<br>search | Eye-balancing | N/A | Manually<br>eye monitoring | SSLMS | Pattern-based |
| Baud-rate | Yes | Yes | Yes | Yes | No | Yes |
| Need extra<br>clock phase | Yes | Yes | N/A | No | No | No |
| Pattern | PRBS7 | PRBS7 | PRBS7 | PRBS31 | PRBS7 | PRBS7 |
| BER | < 10<sup>-12</sup> | < 10<sup>-12</sup> | < 10<sup>-12</sup> | < 10<sup>-12</sup> | < 10<sup>-12</sup> | < 10<sup>-12</sup> |
| High-freq. min.<br>jitter tolerance | 0.15 UI<sub>pp</sub><sup>†</sup> | 0.26 UI<sub>pp</sub> | 0.13 UI<sub>pp</sub><sup>†</sup> | 0.25 UI | 0.1 UI<sub>pp</sub> | 0.1 UI<sub>pp</sub> |
| Power [mW] | 106.3 | 55.4 | 79.2 | 59.7 | 37.2 | 76 |
| FoM<sub>1</sub> [pJ/b] | 3.04 | 2.77 | 2.64 | 3.7 | 3.4 | 2.11 |
| FoM<sub>2</sub> [pJ/b/dB] | 0.17 | 0.27 | 0.2 | 0.1 | 0.1 | 0.1 |

FoM<sub>1</sub> = Power/Data rate. FoM<sub>2</sub> = FoM<sub>1</sub>/Channel loss at the Nyquist frequency.
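As a quick arithmetic check of the two figures of merit defined with Table I, the following Python sketch recomputes them for this work from the reported 76 mW, 36 Gb/s, and 19-dB channel loss at 18 GHz. The function names and the use of the loss magnitude (the table lists the loss as a negative number) are assumptions made for illustration and are not taken from the paper.

```python
def fom1_pj_per_bit(power_mw: float, data_rate_gbps: float) -> float:
    """FoM1 = Power / Data rate. mW divided by Gb/s gives pJ/bit."""
    return power_mw / data_rate_gbps


def fom2_pj_per_bit_per_db(power_mw: float, data_rate_gbps: float,
                           channel_loss_db: float) -> float:
    """FoM2 = FoM1 / channel loss at the Nyquist frequency.

    Assumption: the magnitude of the loss is used, since Table I
    writes the loss with a negative sign.
    """
    return fom1_pj_per_bit(power_mw, data_rate_gbps) / abs(channel_loss_db)


# Values reported for this work: 76 mW, 36 Gb/s, -19 dB at 18 GHz.
fom1 = fom1_pj_per_bit(76.0, 36.0)
fom2 = fom2_pj_per_bit_per_db(76.0, 36.0, -19.0)

print(f"FoM1 = {fom1:.2f} pJ/b")
print(f"FoM2 = {fom2:.2f} pJ/b/dB")
```

With these inputs, FoM<sub>1</sub> evaluates to about 2.11 pJ/b and FoM<sub>2</sub> to about 0.11 pJ/b/dB, consistent with the rounded entries of 2.11 and 0.1 in Table I.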
For the divide-by-8 recovered clock in Fig. 16(a), the measured rms and peak-to-peak jitter values are 2.48 ps<sub>rms</sub> and 16 ps<sub>pp</sub>, respectively, at 1.125 GHz. Since the data are coupled to the clock buffers, the magnitudes of the recovered clock are distorted. Fig. 16(b) shows the measured phase noise of the divide-by-8 recovered clock at 1.125 GHz. The integrated rms jitter is 583.6 fs<sub>rms</sub> for offset frequencies from 100 Hz to 100 MHz. Since this THBR CDR circuit has a loop latency of 128 UI, the phase margin is degraded, which leads to a significant peaking at around 20 MHz in Fig. 16(b).

Fig. 17 shows the measured JTOL with and without the reference voltage adaptation. For BER $< 10^{-12}$, the minimum tolerable magnitudes at high jitter frequencies are 0.1 and 0.07 UI<sub>pp</sub> with and without the reference voltage adaptation, respectively. A significant improvement of 0.05 UI<sub>pp</sub> in the tolerable magnitude is observed at a jitter frequency of 40 MHz. However, due to factors such as the long loop latency, the low PD gain, and the hunting jitter, the measured BER under the PRBS of $2^{15}-1$ is between $10^{-12}$ and $10^{-11}$. Table I compares our work with the BR CDR circuits [2], [3], [4], [15] and an oversampling one [16]. Compared with [2], [3], and [4], our work achieves lower figure-of-merit (FoM) values, with an FoM<sub>1</sub> of 2.11 pJ/b and an FoM<sub>2</sub> of 0.1 pJ/b/dB. Compared with [15] and [16], our data rate is higher with a similar FoM<sub>2</sub> of 0.1 pJ/b/dB.

Fig. 17. Measured JTOL with and without the reference voltage adaptation.

## V. CONCLUSION

A pattern-based adaptation method is presented to adjust the frequency response of the CTLE for a 36-Gb/s THBR RX. The reference voltage of the comparators is also adapted to enhance the timing margin of the recovered clock after the CTLE adaptation. The adaptation circuits for both the CTLE and the reference voltage of the comparators are illustrated. Neither the PI [2] nor the additional multiphase clock [3] is necessary.
Finally, the energy efficiency of this THBR adaptive RX is enhanced. This THBR adaptive RX needs a specific input signal which contains only the first precursor and the first postcursor. To handle a nonlinear channel with reflection, crosstalk, and high loss, a decision-feedback equalizer with multiple taps can be added.

<sup>†</sup> Estimated from the figure (Table I).

#### REFERENCES

- [1] T. Shibasaki et al., "3.5 A 56Gb/s NRZ-electrical 247 mW/lane serial-link transceiver in 28nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan. 2016, pp. 64–65.
- [2] D. Yoo, M. Bagherbeik, W. Rahman, A. Sheikholeslami, H. Tamura, and T. Shibasaki, "A 36-Gb/s adaptive baud-rate CDR with CTLE and 1-tap DFE in 28-nm CMOS," *IEEE Solid-State Circuits Lett.*, vol. 2, no. 11, pp. 252–255, Nov. 2019.
- [3] W.-M. Chen, Y.-S. Yao, and S.-I. Liu, "A 20-Gb/s jitter-tolerance-enhanced digital CDR with one-tap DFE," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 69, no. 3, pp. 894–898, Mar. 2022.
- [4] D. Yoo, M. Bagherbeik, W. Rahman, A. Sheikholeslami, H. Tamura, and T. Shibasaki, "A 30Gb/s 2x half-baud-rate CDR," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2019, pp. 1–4.
- [5] Y.-H. Kim, Y.-J. Kim, T. Lee, and L.-S. Kim, "A 21-Gbit/s 1.63-pJ/bit adaptive CTLE and one-tap DFE with single loop spectrum balancing method," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 2, pp. 789–793, Feb. 2016.
- [6] S. Shahramian et al., "30.5 A 1.41pJ/b 56Gb/s PAM-4 wireline receiver employing enhanced pattern utilization CDR and genetic adaptation algorithms in 7nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 482–484.
- [7] Y.-H. Lan and S.-I. Liu, "A 0.079-pJ/b/dB 32-Gb/s 2× half-baud-rate CDR circuit with frequency detector," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 32, no. 4, pp. 704–713, Apr. 2024.
- [8] B. Razavi, "The design of an equalizer," *IEEE Solid-State Circuits Mag.*, vol. 13, no. 4, pp. 7–11, Fall 2021.
- [9] Multimodal Distribution. Accessed: Feb. 2023. [Online]. Available: https://en.wikipedia.org/wiki/Multimodal\_distribution
- [10] W.-M. Chen, Y.-S. Yao, and S.-I. Liu, "A 10.4–16-Gb/s reference-less baud-rate digital CDR with one-tap DFE using a wide-range FD," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 68, no. 11, pp. 4566–4575, Nov. 2021.
- [11] Y.-C. Huang, P.-Y. Wang, and S.-I. Liu, "An all-digital jitter tolerance measurement technique for CDR circuits," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 59, no. 3, pp. 148–152, Mar. 2012.
- [12] M.-S. Chen, Y.-N. Shih, C.-L. Lin, H.-W. Hung, and J. Lee, "A fully-integrated 40-Gb/s transceiver in 65-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 47, no. 3, pp. 627–640, Mar. 2012.
- [13] B. Razavi, "Active inductor," *IEEE Solid-State Circuits Mag.*, vol. 12, no. 2, pp. 7–11, Spring 2020.
- [14] W.-S. Kim, C.-K. Seong, and W.-Y. Choi, "A 5.4-Gbit/s adaptive continuous-time linear equalizer using asynchronous undersampling histograms," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 59, no. 9, pp. 553–557, Sep. 2012.
- [15] P. A. Francese et al., "A 16 Gb/s 3.7 mW/Gb/s 8-tap DFE receiver and baud-rate CDR with 31 kppm tracking bandwidth," *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2490–2502, Nov. 2014.
- [16] J. Lee, K. Lee, H. Kim, B. Kim, K. Park, and D.-K. Jeong, "A 0.1pJ/b/dB 1.62-to-10.8Gb/s video interface receiver with fully adaptive equalization using un-even data level," in *Proc. Symp. VLSI Circuits*, Jun. 2019, pp. C198–C199.

**Yi-Hao Lan** was born in Taichung, Taiwan, in 1998. He received the B.S. degree in electrical engineering from National Sun Yat-sen University (NSYSU), Kaohsiung, Taiwan, in 2020, and the M.S. degree from the Graduate Institute of Electronic Engineering, National Taiwan University (NTU), Taipei, Taiwan, in 2023.
In 2023, he joined MediaTek, Hsinchu, Taiwan, to work on the development of high-speed SERDES. His research interests include high-speed clock and data recovery circuits (CDRs), equalizer circuits, and mixed-signal IC design.

**Shen-Iuan Liu** (Fellow, IEEE) was born in Keelung, Taiwan, China, in 1965. He received the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University (NTU), Taipei, Taiwan, in 1987 and 1991, respectively.

From 1991 to 1993, he served as a Second Lieutenant with the Chinese Air Force, Taichung, Taiwan. From 1991 to 1994, he was an Associate Professor with the Department of Electronic Engineering, National Taiwan Institute of Technology, Taipei. He joined the Department of Electrical Engineering, NTU, in 1994, where he has been a Professor since 1998 and a Distinguished Professor since August 2010. He was the Director of the Graduate Institute of Electronics Engineering, NTU, from 2013 to 2016. His research interests include analog and digital integrated circuits and systems.

Dr. Liu was a recipient of the Engineering Paper Award from the Chinese Institute of Engineers in 2003, the Young Professor Teaching Award from MXIC Inc., the Research Achievement Award from NTU, the Outstanding Research Award from the National Science Council in 2004, the Outstanding Research Award from the Ministry of Science and Technology in 2014, and the Best Paper Awards at the 2020 and 2021 International Symposium on VLSI Design, Automation and Test and the 2023 International VLSI Symposium on Technology, Systems and Applications. He received the Teaching Excellence Award from NTU in 2022. He was awarded the Himax Chair Professorship at NTU in 2010. He served as the Chair for the IEEE Solid-State Circuits Society (SSCS) Taipei Chapter from 2004 to 2008, which achieved the Best Chapter Award in 2009.
He served as the General Chair for the 15th VLSI Design/CAD Symposium, Taiwan, in 2004, and as a Program Co-Chair for the Fourth IEEE Asia-Pacific Conference on Advanced System Integrated Circuits, Fukuoka, Japan, in 2004. He served as a Technical Program Committee Member for the IEEE International Solid-State Circuits Conference (ISSCC) from 2006 to 2008, IEEE VLSI-DAT from 2008 to 2012, and the Asian Solid-State Circuits Conference (A-SSCC) from 2005 to 2012. He also served as the Technical Program Committee Co-Chair and the Chair for A-SSCC in 2010 and 2011, respectively. He was an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS from 2006 to 2009 and a Guest Editor for Special Issues of the IEEE JOURNAL OF SOLID-STATE CIRCUITS in December 2008 and November 2012. He was an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS from 2006 to 2007 and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS from 2008 to 2009. He was an Editorial Board Member of *Research Letters in Electronics* from 2008 to 2009. He was an Associate Editor of The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Electronics from 2008 to 2011. He has been an Associate Editor of the ETRI Journal and the Journal of Semiconductor Technology and Science, South Korea, since 2009.